Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. Existing research has generally focused on generating SQL statements from text queries, and the broader challenge lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet-of-Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset contains additional query types limited in prior text-to-SQL datasets, notably, temporal-related queries. Our dataset is sourced from a smart building’s IoT ecosystem exploring sensor read and network traffic data. Second, our dataset allows two-stage processing, where the returned data (network traffic) from a generated SQL can be categorized as malicious or not. Our results show that joint training to query and infer information about the data improves overall text-to-SQL performance, nearly matching that of substantially larger models. We also show that current large language models (e.g., GPT3.5) struggle to infer new information about returned data (i.e., they are bad at tabular data understanding), thus our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.more » « lessFree, publicly-accessible full text available May 1, 2026
-
Mobile apps are widely used and often process users’ sensitive data. Many taint analysis tools have been applied to analyze sensitive information flows and report data leaks in apps. These tools require a list of sources (where sensitive data is accessed) as input, and researchers have constructed such lists within the Android platform by identifying Android API methods that allow access to sensitive data. However, app developers may also define methods or use third-party library’s methods for accessing data. It is difficult to collect such source methods because they are unique to the apps, and there are a large number of third-party libraries available on the market that evolve over time. To address this problem, we propose DAISY, a Dynamic-Analysis-Induced Source discoverY approach for identifying methods that return sensitive information from apps and third-party libraries. Trained on an automatically labeled data set of methods and their calling context, DAISY identifies sensitive methods in unseen apps. We evaluated DAISY on real-world apps and the results show that DAISY can achieve an overall precision of 77.9% when reporting the most confident results. Most of the identified sources and leaks cannot be detected by existing technologies.more » « less
An official website of the United States government
